Classification into Readability Levels Implementation and Evaluation

نویسنده

  • Patrik Larsson
چکیده

The use for a readability classification model is mainly as an integrated part of an information retrieval system. By matching the user’s demands of readability to the documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results from the thesis are a number of classification models. The models were induced by training a Support Vector Machines classifier on features that are established by previous research as good measurements of readability. The features were extracted from a corpus annotated with three readability levels. Natural Language Processing tools for tagging and parsing were used to analyze the corpus and enable the extraction of the features from the corpus. Empirical testings of different feature combinations were performed to optimize the classification model. The classification models render a good and stable classification. The best model obtained a precision score of 90.21% and a recall score of 89.56% on the test-set, which is equal to a F-score of 89.88.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cohesive Readability of Expository Texts and Reading Comprehension Performance: Iranian EFL students of Different Proficiency Levels in Focus

Abstract The present study is an attempt to investigate the relationship between cohesive readability of expository texts and reading comprehension in EFL students with different proficiency levels. One hundred students formed the participant of this study. They were undergraduate students majoring in English at University of Isfahan. To collect the relevant data, participants were divide...

متن کامل

Cohesive Readability of Expository Texts and Reading Comprehension Performance: Iranian EFL students of Different Proficiency Levels in Focus

Abstract The present study is an attempt to investigate the relationship between cohesive readability of expository texts and reading comprehension in EFL students with different proficiency levels. One hundred students formed the participant of this study. They were undergraduate students majoring in English at University of Isfahan. To collect the relevant data, participants were divide...

متن کامل

Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories

I investigate Russian second language readability assessment using a machine-learning approach with a range of lexical, morphological, syntactic, and discourse features. Testing the model with a new collection of Russian L2 readability corpora achieves an F-score of 0.671 and adjacent accuracy 0.919 on a 6-level classification task. Information gain and feature subset evaluation shows that morp...

متن کامل

Designing and Evaluating Patient Education Pamphlets based on Readability Indexes and Comparison with Literacy Levels of Society

Background: Hundreds of patient education materials i.e. pamphlets are annually published in healthcare systems following their design, correction, and revision. Aim: to design and evaluate patient education pamphlets based on readability indexes and their comparison with literacy level in society. Method: The average literacy level among 500 patients admitted to two training hospitals in Bojnu...

متن کامل

EFL Textbook Evaluation: An Analysis of Readability and Vocabulary Profiler of Four Corners Book Series

This study aimed to investigate whether there is any significant relationship between the readability and vocabulary profile including the most frequent words (K1 words) and academic word list (AWL) of reading passages of Four Corners series which were EFL textbooks. To determine the readability of the texts, the Flesch–Kincaid (1975) readability test was used, while the texts' academic word li...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006